In this notebook, a template is provided for you to implement, in stages, the functionality required to successfully complete this project. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission, if necessary. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation for your project. Note that some sections of implementation are optional, and will be marked with 'Optional' in the header.
In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can be edited, typically by double-clicking the cell to enter edit mode.
# Load pickled data
import pickle
# TODO: Fill this in based on where you saved the training and testing data
training_file = 'samples/train.p'
testing_file = 'samples/test.p'
with open(training_file, mode='rb') as f:
train = pickle.load(f)
with open(testing_file, mode='rb') as f:
test = pickle.load(f)
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
The pickled data is a dictionary with 4 key/value pairs:
- 'features' is a 4D array containing raw pixel data of the traffic sign images, (num examples, width, height, channels).
- 'labels' is a 1D array containing the label/class id of each traffic sign. The file signnames.csv contains id -> name mappings for each id.
- 'sizes' is a list of tuples, (width, height), representing the original width and height of each image.
- 'coords' is a list of tuples, (x1, y1, x2, y2), representing the coordinates of a bounding box around the sign in the image. These coordinates refer to the original image; the pickled data contains resized (32 by 32) versions of these images.

Complete the basic data summary below.
Based on the above understanding of the dataset, below we further load sizes, coords, and the mapping from each label id to its description.
sizes_train, coords_train = train['sizes'], train['coords']
sizes_test, coords_test = test['sizes'], test['coords']
import csv
with open('signnames.csv', newline='') as f:
reader = csv.DictReader(f)
sign_id_to_name = {int(row['ClassId']): row['SignName'] for row in reader}
The first dimension of features is the count of samples, while the second and third are the width and height of the image, respectively. The labels, sizes, and coords share that first dimension with features, and each entry is associated with the corresponding feature.
sign_id_to_name provides a mapping covering all the signs, thus its cardinality is the number of sign classes.
import numpy as np
### Replace each question mark with the appropriate value.
# TODO: Number of training examples
n_train = np.shape(X_train)[0]
# TODO: Number of testing examples.
n_test = np.shape(X_test)[0]
# TODO: What's the shape of a traffic sign image?
image_shape = np.shape(X_train)[1:3]
# TODO: How many unique classes/labels there are in the dataset.
n_classes = len(sign_id_to_name)
input_depth = 3
print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
Visualize the German Traffic Signs Dataset using the pickled file(s). This is open ended, suggestions include: plotting traffic sign images, plotting the count of each sign, etc.
The Matplotlib examples and gallery pages are a great resource for doing visualizations in Python.
NOTE: It's recommended you start with something simple first. If you wish to do more, come back to it after you've completed the rest of the sections.
### Data exploration visualization goes here.
### Feel free to use as many code cells as needed.
import matplotlib.pyplot as plt
# Visualizations will be shown in the notebook.
%matplotlib inline
The following routines automate the display of one sample of each kind of traffic sign.
def plot_sign(image, annotations):
plt.title(annotations['title'])
plt.ylabel(str.format('Original size: {}', annotations['size_original']))
plt.xlabel(str.format('Coordinates at the original: {}', annotations['position_original']))
plt.imshow(image)
def sign_indices(labels, sign_map):
    """Return the index of the first sample found for each sign id in sign_map."""
    labels = np.asarray(labels)
    indices = []
    for sign_id in sign_map:
        matches = np.flatnonzero(labels == sign_id)
        if len(matches) > 0:
            indices.append(matches[0])
    return indices
Show one sample of each kind of traffic sign from the training and testing sets, in order to understand the visual appearance of the samples:
Features = {}
Features['train'] = X_train
Features['test'] = X_test
Labels = {}
Labels['train'] = y_train
Labels['test'] = y_test
Sizes = {}
Sizes['train'] = sizes_train
Sizes['test'] = sizes_test
Coords = {}
Coords['train'] = coords_train
Coords['test'] = coords_test
indices_train, indices_test = [sign_indices(Labels[t], sign_id_to_name) for t in ['train', 'test']]
def sample_images(features, labels, sizes, coords, kind='train'):
    # Use the arguments passed in rather than the module-level dictionaries.
    indices = sign_indices(labels, sign_id_to_name)
    images_and_annotations = [[features[i],
                               {'title': sign_id_to_name[labels[i]],
                                'size_original': sizes[i],
                                'position_original': coords[i]}] for i in indices]
    return images_and_annotations
i_and_a_train, i_and_a_test = [sample_images(Features[t], Labels[t], Sizes[t], Coords[t], kind = t)
for t in ['train', 'test']]
k = max(len(i_and_a_train), len(i_and_a_test))
Below is a full set of traffic sign samples from both the training and testing sets. The left column is from the training set, and the right from the testing set.
f,axarr=plt.subplots(nrows=k, ncols=2, figsize=(11,k*3), sharex=True, sharey=True)
plt.subplots_adjust(wspace = 0.5, hspace = 0.7)
for i in range(k):
    if i < len(i_and_a_train):
        image, annotations = i_and_a_train[i]
        plt.subplot(k, 2, i*2+1)
        plot_sign(image, annotations)
    if i < len(i_and_a_test):
        image, annotations = i_and_a_test[i]
        plt.subplot(k, 2, i*2+2)
        plot_sign(image, annotations)
hist, bin_edges = np.histogram(y_train, bins=np.arange(n_classes + 1))
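The per-class counts computed by np.histogram above can be rendered as a bar chart. A minimal self-contained sketch, using a small hypothetical label vector in place of y_train:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

labels = np.array([0, 1, 1, 2, 2, 2])  # hypothetical stand-in for y_train
n_classes = 3
counts, _ = np.histogram(labels, bins=np.arange(n_classes + 1))

plt.bar(np.arange(n_classes), counts)
plt.xlabel("Class id")
plt.ylabel("Number of samples")
plt.title("Class distribution")
print(counts.tolist())  # [1, 2, 3]
```

With bins=np.arange(n_classes + 1), each integer class id falls into its own bin, so the histogram is exactly a per-class count.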
Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.
There are various aspects to consider when thinking about this problem:
Here is an example of a published baseline model on this problem. It's not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.
NOTE: The LeNet-5 implementation shown in the classroom at the end of the CNN lesson is a solid starting point. You'll have to change the number of classes and possibly the preprocessing, but aside from that it's plug and play!
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
### Preprocess the data here.
### Feel free to use as many code cells as needed.
from sklearn.utils import shuffle
X_train, y_train = shuffle(X_train, y_train)
Describe how you preprocessed the data. Why did you choose that technique?
Answer:
Here are some potential ways to preprocess the sample data:
Normalize the sample data to have zero mean and a suitable standard deviation.
Reshuffle the sequence of the samples so that their order is random.
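The zero-mean normalization mentioned above can be sketched as follows (a minimal version, assuming uint8 RGB images shaped like X_train; the scale constant 128 is a common choice, not taken from this notebook):

```python
import numpy as np

def normalize(images):
    """Shift uint8 pixels to zero mean and roughly unit spread."""
    images = images.astype(np.float32)
    return (images - 128.0) / 128.0  # maps [0, 255] to about [-1, 1]

batch = np.zeros((2, 32, 32, 3), dtype=np.uint8)
batch[0] = 255  # one all-white image, one all-black image
normed = normalize(batch)
print(normed.min(), normed.max())  # -1.0 0.9921875
```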
Describe how you set up the training, validation and testing data for your model. Optional: If you generated additional data, how did you generate the data? Why did you generate the data? What are the differences in the new dataset (with generated data) from the original dataset?
from sklearn.model_selection import train_test_split
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size = 0.2, random_state = 0)
Answer:
Split the training data to set aside validation data.
On varying the kernel/filter size: LeNet is optimized for digit recognition, where the figure is much more dominant in the image than a traffic sign is. So the kernel size of 5 may not be optimal for traffic signs; increasing it might help the classifier focus on bigger features, and thus be more robust.
Changing the kernel size from 5 to 6, cross-validation accuracy increases to 0.987.
Try kernel size 7, epochs 500.
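For context on these kernel-size experiments: with 'VALID' padding and stride 1, a convolution shrinks each spatial side from W to W - k + 1. A quick check (not code from this notebook) of the first-layer output sizes for the kernel sizes tried:

```python
def conv_out(size, kernel, stride=1):
    """Spatial output size of a VALID-padded convolution."""
    return (size - kernel) // stride + 1

for k in (3, 4, 5, 6, 7):
    print(k, conv_out(32, k))  # k = 5 gives the classic LeNet 28x28
```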
### Define your architecture here.
### Feel free to use as many code cells as needed.
import tensorflow as tf
from tensorflow.contrib.layers import flatten
keep_prob = tf.placeholder(tf.float32) # probability to keep units
def LeNet(x):
# Hyperparameters
mu = 0
sigma = 0.1
# Layer 0: Color adaptation, convolutional. Input = 32x32x3, Output 32x32x3
conv0_W = tf.Variable(tf.truncated_normal(shape=(1, 1, input_depth, input_depth), mean = mu, stddev = sigma))
conv0_b = tf.Variable(tf.zeros(input_depth))
conv0 = tf.nn.conv2d(x, conv0_W, strides=[1, 1, 1, 1], padding='VALID') + conv0_b
# SOLUTION: Activation.
conv0 = tf.nn.relu(conv0)
    # SOLUTION: Layer 1: Convolutional. Input = 32x32x3. With kernel size k, output = (33-k)x(33-k)x6 (28x28x6 for k = 5).
    kernel1_size = 3 # values tried: 5, 6, 7, 4
conv1_W = tf.Variable(tf.truncated_normal(shape=(kernel1_size, kernel1_size, input_depth, 6), mean = mu, stddev = sigma))
conv1_b = tf.Variable(tf.zeros(6))
conv1 = tf.nn.conv2d(conv0, conv1_W, strides=[1, 1, 1, 1], padding='VALID') + conv1_b
# SOLUTION: Activation.
conv1 = tf.nn.relu(conv1)
# SOLUTION: Pooling. Input = 28x28x6. Output = 14x14x6.
conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
# SOLUTION: Layer 2: Convolutional. Output = 10x10x16.
conv2_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 6, 16), mean = mu, stddev = sigma))
conv2_b = tf.Variable(tf.zeros(16))
conv2 = tf.nn.conv2d(conv1, conv2_W, strides=[1, 1, 1, 1], padding='VALID') + conv2_b
# SOLUTION: Activation.
conv2 = tf.nn.relu(conv2)
# SOLUTION: Pooling. Input = 10x10x16. Output = 5x5x16.
conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
# SOLUTION: Flatten. Input = 5x5x16. Output = 400.
fc0 = flatten(conv2)
fc0_dim = fc0.get_shape().as_list()[1]
# SOLUTION: Layer 3: Fully Connected. Input = 400. Output = 120.
fc1_W = tf.Variable(tf.truncated_normal(shape=(fc0_dim, 120), mean = mu, stddev = sigma))
fc1_b = tf.Variable(tf.zeros(120))
fc1 = tf.matmul(fc0, fc1_W) + fc1_b
# SOLUTION: Activation.
fc1 = tf.nn.relu(fc1)
# Drop-out.
fc1 = tf.nn.dropout(fc1, keep_prob)
# SOLUTION: Layer 4: Fully Connected. Input = 120. Output = 84.
fc2_W = tf.Variable(tf.truncated_normal(shape=(120, 84), mean = mu, stddev = sigma))
fc2_b = tf.Variable(tf.zeros(84))
fc2 = tf.matmul(fc1, fc2_W) + fc2_b
# SOLUTION: Activation.
fc2 = tf.nn.relu(fc2)
# Drop-out.
# fc2 = tf.nn.dropout(fc2, keep_prob)
    # SOLUTION: Layer 5: Fully Connected. Input = 84. Output = n_classes.
fc3_W = tf.Variable(tf.truncated_normal(shape=(84, n_classes), mean = mu, stddev = sigma))
fc3_b = tf.Variable(tf.zeros(n_classes))
logits = tf.matmul(fc2, fc3_W) + fc3_b
return logits
What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.
Answer: Start with LeNet. Adapt it to the required dimensions, without preprocessing the input.
Need to deal with input of R, G, B, so the input depth should be 3.
The final output should be the number of classes must be adapted to be n_classes, which is defined as n_classes = len(sign_id_to_name)
All the rest remains up to this moment.
With the minimum change to LeNet, and the raw input without pre-processing, the learning curves show very slow progress and seem to remain at very low accuracies for both training and cross-validation. This may suggest that the network suffers from under-fitting: the input is too complex to learn. The next action is to do more pre-processing to make the input less challenging.
The depth of the first layer's output may need to be increased to account for much more complex features. The original was 6. I'll set it to 108 based on my understanding of LeCun's paper.
### Train your model here.
### Feel free to use as many code cells as needed.
Try increasing the BATCH_SIZE from 128 to 10000, as I still have much memory available (8 GB) with BATCH_SIZE 128.
Also try increasing the EPOCHS from 10, as training progresses slowly.
After reading https://carnd-forums.udacity.com/questions/32112911/increasing-batch-size-results-in-failure-to-converge.-really-strange I realized that by increasing BATCH_SIZE from 128 to 10000, I reduced the number of training updates by about 100 times, so I must increase EPOCHS by about 100 times to keep the amount of training equivalent. So I experimented with
BATCH_SIZE = 10000
EPOCHS = 10*(math.floor(BATCH_SIZE/128))
Upon observation, this indeed solved the problem of making no progress, and it seems that EPOCHS = 250 to 300 would be sufficient, as the training accuracy reaches 1.00 at 248.
Further increase the BATCH_SIZE from 10000 to 40000, and update the EPOCHS to 1600 to keep the amount of training comparable.
Change BATCH_SIZE to 5000, and EPOCHS to 200.
Change BATCH_SIZE to 10000, and EPOCHS to 400
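The scaling rule above, EPOCHS = 10 * floor(BATCH_SIZE / 128), keeps the total number of gradient updates roughly constant across batch sizes. A sanity check, with N a hypothetical training-set size of 39209:

```python
import math

N = 39209  # hypothetical training-set size

def updates(epochs, batch_size):
    """Total gradient updates over a whole run."""
    return epochs * math.ceil(N / batch_size)

print(updates(10, 128))                              # baseline: 3070
print(updates(10 * math.floor(10000 / 128), 10000))  # scaled:   3120
```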
import math
BATCH_SIZE = 10000
EPOCHS = 500
10*(math.floor(BATCH_SIZE/128))
x = tf.placeholder(tf.float32, (None, 32, 32, input_depth))
y = tf.placeholder(tf.int32, (None))
one_hot_y = tf.one_hot(y, n_classes)
Try increasing the learning rate, as at 0.001 the performance starts poorly (except on the very first training run) and always progresses very slowly.
With the learning rate set to 0.01, all other conditions being the same as above, the performance is even worse. The training and validation accuracy stay at about 0.05; nothing is learned at all. So reduce the learning rate from 0.01 to 0.007.
With the learning rate at 0.007, all other conditions being the same as above, the performance is still as bad. The training and validation accuracy stay at about 0.005; nothing is learned at all. So reduce the learning rate from 0.007 back to 0.001.
Per https://carnd-forums.udacity.com/questions/12619143/one-reason-for-low-accuracy-ill-conditioned-value-for-log-calculation, when the logits are too small there might be numerical instability in the log computation; to avoid that, add
logits = tf.clip_by_value(logits, 1e-10, 1.0)
below.
Adding this numerical-stability clipping does not help at all. The accuracy does not improve, remaining at 0.008 and 0.009 for training and validation, respectively. I suspect there is some serious mistake in the implementation.
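Note that clipping the logits to [1e-10, 1.0] changes the model's outputs rather than stabilizing the loss computation; the usual remedy is the log-sum-exp shift, which tf.nn.softmax_cross_entropy_with_logits already applies internally. A numpy illustration of the trick (not code from this notebook):

```python
import numpy as np

def stable_log_softmax(z):
    """Log-softmax via the log-sum-exp shift."""
    z = z - z.max(axis=-1, keepdims=True)  # largest logit becomes 0, so exp cannot overflow
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

logits = np.array([[1000.0, 1001.0, 1002.0]])  # a naive softmax overflows on these
log_probs = stable_log_softmax(logits)
print(np.isfinite(log_probs).all())  # True
```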
Change the learning rate from 0.001 to 0.0007 to accommodate the drop-out scheme, in which the weights that are not dropped must compensate with additional weight changes.
rate = 0.0007
logits = LeNet(x)
#logits = tf.clip_by_value(logits, 1e-10, 1.0) # added to improve numerical stability.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate = rate)
training_operation = optimizer.minimize(loss_operation)
#predict = tf.argmax(logits, 1)
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()
def evaluate(X_data, y_data):
num_examples = len(X_data)
total_accuracy = 0
sess = tf.get_default_session()
for offset in range(0, num_examples, BATCH_SIZE):
batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y, keep_prob: 1.0})
total_accuracy += (accuracy * len(batch_x))
return total_accuracy / num_examples
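In evaluate above, per-batch accuracies are weighted by batch length because the last batch can be smaller than BATCH_SIZE. A toy check with hypothetical batch sizes and accuracies shows why a plain average would be biased:

```python
sizes = [128, 128, 44]     # last batch is partial
accs = [0.90, 0.80, 1.00]  # hypothetical per-batch accuracies

weighted = sum(a * n for a, n in zip(accs, sizes)) / sum(sizes)
naive = sum(accs) / len(accs)
print(round(weighted, 3), round(naive, 3))  # 0.872 0.9
```

The naive mean over-weights the small final batch; the weighted mean equals the accuracy over all 300 samples.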
in_training = False
if in_training:
epochs = []
accuracies_training = []
accuracies_validation = []
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
num_examples = len(X_train)
print("Training...")
print()
for i in range(EPOCHS):
X_train, y_train = shuffle(X_train, y_train)
for offset in range(0, num_examples, BATCH_SIZE):
end = offset + BATCH_SIZE
batch_x, batch_y = X_train[offset:end], y_train[offset:end]
sess.run(training_operation, feed_dict={x: batch_x, y: batch_y, keep_prob: 0.5})
training_accuracy = evaluate(X_train, y_train)
validation_accuracy = evaluate(X_validation, y_validation)
epochs.append(i)
accuracies_training.append(training_accuracy)
accuracies_validation.append(validation_accuracy)
print("EPOCH {} ...".format(i+1))
print("Training Accuracy = {:.3f}".format(training_accuracy))
print("Validation Accuracy = {:.3f}".format(validation_accuracy))
print()
saver.save(sess, 'lenet')
print("Model saved")
plt.plot(epochs, accuracies_training,'b.', label="Training")
plt.plot(epochs, accuracies_validation, 'r-', label="Validation")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.title("Learning Curve")
plt.legend(loc='best') # place legend to avoid overlapping with curves.
with tf.Session() as sess:
saver.restore(sess, tf.train.latest_checkpoint('.'))
test_accuracy = evaluate(X_test, y_test)
print("Test Accuracy = {:.3f}".format(test_accuracy))
How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)
Answer: Use LeNet's training setup for now: the Adam optimizer, with the batch size, epochs, and learning rate discussed above.
What approach did you take in coming up with a solution to this problem? It may have been a process of trial and error, in which case, outline the steps you took to get to the final solution and why you chose those steps. Perhaps your solution involved an already well known implementation or architecture. In this case, discuss why you think this is suitable for the current problem.
Answer:
With minimum adaptation from LeNet (just changing the input depth to 3 and the number of classes to 43), and with sufficient training epochs, the training accuracy reaches 1.0 and the validation accuracy 0.946. This shows that LeNet fundamentally has the capacity to learn the task of classifying traffic signs, but it seems to suffer from overfitting.
Recall that there is much variation in the input images. Next, I decided to do zero-mean normalization to see if it would help improve the validation accuracy. As the implementation of the normalization seemed non-trivial, I realized that a color-space adaptation would achieve the same effect, including the normalization.
So I decided to do the color adaptation (convolution) first.
Adding the color adaptation turned out to be helpful: the validation accuracy increases to 0.972, while the training accuracy stays at 1.0.
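The color-adaptation layer is a 1x1 convolution, i.e. a learned linear map applied independently at every pixel across the three channels. In numpy terms (random weights standing in for the learned ones):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3)).astype(np.float32)        # one RGB image
W = rng.normal(0.0, 0.1, size=(3, 3)).astype(np.float32)  # (in_channels, out_channels)
b = np.zeros(3, dtype=np.float32)

out = image @ W + b  # what a 1x1 VALID convolution computes at each pixel
print(out.shape)     # (32, 32, 3)
```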
Next, I'd like to try a more general technique: drop-out.
With drop-out on the first fully connected layer, the validation accuracy again improves, to 0.983, while the training accuracy is 0.998 at 300 EPOCHS.
Trying drop-out at the next hidden layer as well, with EPOCHS increased from 300 to 500: the validation accuracy stays at 0.983, and the training accuracy is still 0.998. The additional drop-out does not seem to help.
Next, remove the drop-out at the first fully connected hidden layer, that is, switch the drop-out to the next hidden layer alone, but train for 500 EPOCHS. The results show slight improvement: validation accuracy 0.985, training accuracy 1.0.
It seems that with drop-out, the learning rate may need to be reduced to accommodate the compensation by the remaining weights. Try the combination of drop-out at the first fully connected hidden layer with a smaller learning rate, since dropping out at the next hidden layer alone may not be enough (it has fewer weights/parameters). Yes, there is further improvement: the validation accuracy reaches 0.989, training 1.0.
Next, try making BATCH_SIZE large enough to hold the full batch, for more accurate gradient descent. The improvement is not significant. In fact, changing BATCH_SIZE from 10000 to 40000, cross-validation gets worse, 0.978, and there is a bigger consistent gap between training and validation accuracy: it grows from 0.01 to 0.02. This is interesting: more perfect training results in less desirable generalization, while imperfect training might result in better generalization!
Next, try the reverse, decreasing BATCH_SIZE, to further confirm this understanding; all other parameters remain, except EPOCHS is adjusted to keep the number of training updates constant. Indeed, the smaller BATCH_SIZE of 5000 helps cross-validation, reaching 0.987 at 200 EPOCHS.
Last, try BATCH_SIZE 100 and EPOCHS 20 to see what happens. The validation accuracy is 0.986, the training accuracy 0.994. It looks promising, as the training has not yet saturated.
Next, increase EPOCHS to 40. No improvement, still 0.985.
Restore the best setting so far to confirm: BATCH_SIZE = 10000, EPOCHS = 400. Result: 0.986.
With the kernel1 size changed from 5 to 6, all other conditions the same: 0.987.
With the kernel size changed to 7, all other conditions the same: 0.986.
With the kernel size changed to 4, all other conditions the same: 0.989, even 0.990!
It seems that if an architecture or parameter is promising, then during the initial training the validation accuracy might even exceed the training accuracy, or the gap is very small, say 0.001.
With the kernel size changed to 3, all other conditions the same: 0.990!
Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.
You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.
Use the code cell (or multiple code cells, if necessary) to implement this step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
### Load the images and plot them here.
### Feel free to use as many code cells as needed.
import os
import matplotlib.image as mpimg
X_new = []
y_new = []
name_to_sign_id = {
'30-1': 1,
'30': 1,
'animal': 31,
'curve-1': 19,
'curve': 19,
'keep-right': 38,
'left': 34,
'no-entry': 17,
'stop-chinese': 14,
'stop-distorted': 14,
'stop': 14,
'stop1': 14,
'yield': 13
}
for file in os.listdir('./new-samples/'):
name, ext = os.path.splitext(file)
y_new.append(name_to_sign_id[name])
image = mpimg.imread(os.path.join('./new-samples/', file))
X_new.append(image)
Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It could be helpful to plot the images in the notebook.
Answer:
# Scratch: element-wise comparison and boolean indexing need numpy arrays, not plain lists.
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
a != b      # element-wise mask: array([False, False, False])
a[a != b]   # empty, since the two arrays are identical
b[a != b]
### Run the predictions here.
### Feel free to use as many code cells as needed.
def predictions(X_data, y_data, indices):
xx, yy = X_data[indices], y_data[indices]
sess = tf.get_default_session()
logits_out = sess.run(logits, feed_dict = {x: xx, y: yy, keep_prob: 1.0})
predicts = np.argmax(logits_out, 1)
error_boolean_idx = np.not_equal(predicts, yy)
xx_failed, predict_wrong, expected = xx[error_boolean_idx], predicts[error_boolean_idx], yy[error_boolean_idx]
by_expected = np.argsort(expected)
return xx_failed[by_expected], predict_wrong[by_expected], expected[by_expected]
with tf.Session() as sess:
saver.restore(sess, tf.train.latest_checkpoint('.'))
failed_X, wrong_predict, expected_y = predictions( X_validation, y_validation, slice(None, None, None) )
expected_y
wrong_predict
import matplotlib.gridspec as gridspec
import math
def disp_prediction_errors(X, predict, expected, class_dict, columns = 5):
cases = len(X)
rows = math.ceil(cases/columns)
gs1 = gridspec.GridSpec(rows, columns)
gs1.update(wspace=0.9, hspace=0.9) # set the spacing between axes.
plt.figure(figsize=(15,15))
for i in range(cases):
ax1 = plt.subplot(gs1[i])
#plt.subplot(rows,columns,i+1)
ax1.set_xticklabels([])
ax1.set_yticklabels([])
# ax1.set_aspect('equal')
ax1.set_title(str.format('Predicted: {}', class_dict[predict[i]]))
ax1.set_xlabel(str.format('Expected: {}', class_dict[expected[i]]))
plt.imshow(X[i])
#plt.axis('off')
plt.show()
disp_prediction_errors(failed_X, wrong_predict, expected_y, sign_id_to_name, columns=4)
# Scratch left over from an earlier version: `predictions` here is the function defined
# above, so it cannot be compared against y_test or unpacked. Run it on the test set first.
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    failed_X_test, wrong_predict_test, expected_y_test = predictions(X_test, y_test, slice(None))
plt.imshow(failed_X_test[4])
np.shape(wrong_predict_test)
np.shape(expected_y_test)
Is your model able to perform equally well on captured pictures when compared to testing on the dataset? The simplest way to do this is to check the accuracy of the predictions. For example, if the model predicted 1 out of 5 signs correctly, it's 20% accurate.
NOTE: You could check the accuracy manually by using signnames.csv (same directory). This file has a mapping from the class id (0-42) to the corresponding sign name. So, you could take the class id the model outputs, lookup the name in signnames.csv and see if it matches the sign from the image.
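A minimal sketch of that accuracy check (y_true and y_pred are hypothetical stand-ins for the hand-labeled class ids and the model's predicted ids):

```python
import numpy as np

y_true = np.array([1, 31, 19, 17, 14])  # hand-labeled class ids (hypothetical)
y_pred = np.array([1, 31, 19, 14, 14])  # model predictions (hypothetical)

accuracy = np.mean(y_pred == y_true)
print(accuracy)  # 0.8, i.e. 4 of 5 signs correct
```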
Answer:
### Visualize the softmax probabilities here.
### Feel free to use as many code cells as needed.
Use the model's softmax probabilities to visualize the certainty of its predictions, tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)
tf.nn.top_k will return the values and indices (class ids) of the top k predictions. So if k=3, for each sign, it'll return the 3 largest probabilities (out of a possible 43) and the corresponding class ids.
Take this numpy array as an example:
# (5, 6) array
a = np.array([[ 0.24879643, 0.07032244, 0.12641572, 0.34763842, 0.07893497,
0.12789202],
[ 0.28086119, 0.27569815, 0.08594638, 0.0178669 , 0.18063401,
0.15899337],
[ 0.26076848, 0.23664738, 0.08020603, 0.07001922, 0.1134371 ,
0.23892179],
[ 0.11943333, 0.29198961, 0.02605103, 0.26234032, 0.1351348 ,
0.16505091],
[ 0.09561176, 0.34396535, 0.0643941 , 0.16240774, 0.24206137,
0.09155967]])
Running it through sess.run(tf.nn.top_k(tf.constant(a), k=3)) produces:
TopKV2(values=array([[ 0.34763842, 0.24879643, 0.12789202],
[ 0.28086119, 0.27569815, 0.18063401],
[ 0.26076848, 0.23892179, 0.23664738],
[ 0.29198961, 0.26234032, 0.16505091],
[ 0.34396535, 0.24206137, 0.16240774]]), indices=array([[3, 0, 5],
[0, 1, 4],
[0, 5, 1],
[1, 3, 5],
[1, 4, 3]], dtype=int32))
Looking just at the first row we get [ 0.34763842, 0.24879643, 0.12789202], you can confirm these are the 3 largest probabilities in a. You'll also notice [3, 0, 5] are the corresponding indices.
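The same top-k selection can be reproduced in plain numpy, as a cross-check on the first row of the tf.nn.top_k output above:

```python
import numpy as np

a = np.array([[0.24879643, 0.07032244, 0.12641572, 0.34763842, 0.07893497, 0.12789202]])
k = 3
idx = np.argsort(a, axis=1)[:, ::-1][:, :k]  # indices of the k largest, descending
vals = np.take_along_axis(a, idx, axis=1)    # the corresponding probabilities
print(idx[0].tolist())  # [3, 0, 5]
```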
Answer:
Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the IPython notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.